This paper deals with empirical processes of the type

$$C_n(B) = \sqrt{n}\,\{\mu_n(B) - P(X_{n+1} \in B \mid X_1, \ldots, X_n)\},$$

where $(X_n)$ is a sequence of random variables and $\mu_n = (1/n) \sum_{i=1}^n \delta_{X_i}$ the empirical measure. Conditions for $\sup_B |C_n(B)|$ to converge stably (in particular, in distribution) are given, where $B$ ranges over a suitable class of measurable sets. These conditions apply when $(X_n)$ is exchangeable or, more generally, conditionally identically distributed (in the sense of Berti et al. [Ann. Probab. 32 (2004) 2029-2052]). By such conditions, in some relevant situations, one obtains that $\sup_B |C_n(B)| \xrightarrow{P} 0$ or even that $\sqrt{n}\,\sup_B |C_n(B)|$ converges a.s. Results of this type are useful in Bayesian statistics.

Keywords: Bayesian predictive inference; central limit theorem; conditional identity in 
distribution; empirical distribution; exchangeability; predictive distribution; stable convergence 

1. Introduction and motivations 

A number of real problems reduce to the evaluation of the predictive distribution 

$$a_n(\cdot) = P(X_{n+1} \in \cdot \mid X_1, \ldots, X_n)$$

for a sequence $X_1, X_2, \ldots$ of random variables. Here, we focus on those situations where $a_n$ cannot be calculated in closed form and one decides to estimate it based on the available data $X_1, \ldots, X_n$. Related references are [1-3, 5, 6, 8, 10, 15, 18, 20].



This is an electronic reprint of the original article published by the ISI/BS in Bernoulli, 
2009, Vol. 15, No. 4, 1351-1367. This reprint differs from the original in pagination and 
typographic detail. 



1350-7265 © 2009 ISI/BS 






Berti, Crimaldi, Pratelli and Rigo 



For notational reasons, it is convenient to work in coordinate probability space. Accordingly, we fix a measurable space $(S, \mathcal{B})$ and a probability $P$ on $(S^\infty, \mathcal{B}^\infty)$, and we let $X_n$ be the $n$th canonical projection on $(S^\infty, \mathcal{B}^\infty, P)$, $n \geq 1$. We also let

$$\mathcal{G}_n = \sigma(X_1, \ldots, X_n) \quad\text{and}\quad X = (X_1, X_2, \ldots).$$

Since we are concerned with predictive distributions, it is reasonable to make some (qualitative) assumptions about them. In [6], $X$ is said to be conditionally identically distributed (c.i.d.) when

$$E(I_B(X_k) \mid \mathcal{G}_n) = E(I_B(X_{n+1}) \mid \mathcal{G}_n) \quad\text{a.s. for all } B \in \mathcal{B} \text{ and } k > n \geq 0,$$

where $\mathcal{G}_0$ is the trivial $\sigma$-field. Thus, at each time $n \geq 0$, the future observations $(X_k : k > n)$ are identically distributed given the past $\mathcal{G}_n$. In a sense, this is a weak form of exchangeability. In fact, $X$ is exchangeable if and only if it is stationary and c.i.d., and various examples of non-exchangeable c.i.d. sequences are available.

In the sequel, $X = (X_1, X_2, \ldots)$ is a c.i.d. sequence of random variables.

In that case, a sound estimate of $a_n$ is the empirical distribution

$$\mu_n = \frac{1}{n} \sum_{i=1}^n \delta_{X_i}.$$

The choice of $\mu_n$ can be defended as follows. Let $\mathcal{D} \subset \mathcal{B}$ and let $\|\cdot\|$ denote the sup-norm on $\mathcal{D}$. Suppose also that $\mathcal{D}$ is countably determined, as defined in Section 2. (The latter is a mild condition, only needed to handle measurability issues.) Then

$$\|\mu_n - a_n\| = \sup_{B \in \mathcal{D}} |\mu_n(B) - a_n(B)| \xrightarrow{a.s.} 0, \qquad (1)$$

provided ($X$ is c.i.d. and) $\mu_n$ converges uniformly on $\mathcal{D}$ with probability 1; see [5]. For instance, $\|\mu_n - a_n\| \xrightarrow{a.s.} 0$ whenever $X$ is exchangeable and $\mathcal{D}$ is a Glivenko-Cantelli class. Also, $\|\mu_n - a_n\| \xrightarrow{a.s.} 0$ if $S = \mathbb{R}$, $\mathcal{D} = \{(-\infty, t] : t \in \mathbb{R}\}$, and $X_1$ has a discrete distribution or $\inf_{\epsilon > 0} \liminf_n P(|X_{n+1} - X_n| < \epsilon) = 0$; see [4].

To sum up, under mild assumptions, $\mu_n$ is a consistent estimate of $a_n$ (with respect to uniform distance) for c.i.d. data. This is in line with de Finetti [10] in the particular case of exchangeable indicators.

Taking (1) as a starting point, the next step is to investigate the convergence rate, that is, to investigate whether $\alpha_n \|\mu_n - a_n\|$ converges in distribution, possibly to a null limit, for suitable constants $\alpha_n > 0$. This is precisely the purpose of this paper.

A first piece of information on the convergence rate of $\|\mu_n - a_n\|$ can be obtained as follows. For $B \in \mathcal{B}$, define

$$\mu(B) = \limsup_n \mu_n(B), \qquad W_n(B) = \sqrt{n}\,\{\mu_n(B) - \mu(B)\}.$$
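By the SLLN for c.i.d. sequences, $\mu_n(B) \xrightarrow{a.s.} \mu(B)$; see [6]. Hence, for fixed $n \geq 0$ and $B \in \mathcal{B}$, one obtains

$$E(\mu(B) \mid \mathcal{G}_n) = \lim_k E(\mu_k(B) \mid \mathcal{G}_n) = \lim_k \frac{1}{k} \sum_{i=n+1}^k E(I_B(X_i) \mid \mathcal{G}_n) = E(I_B(X_{n+1}) \mid \mathcal{G}_n) = a_n(B) \quad\text{a.s.}$$

In turn, this implies that $\sqrt{n}\,\{\mu_n(B) - a_n(B)\} = E(W_n(B) \mid \mathcal{G}_n)$ a.s., so

$$\|\mu_n - a_n\| \leq \frac{1}{\sqrt{n}} \sup_{B \in \mathcal{D}} E(|W_n(B)| \mid \mathcal{G}_n) \leq \frac{1}{\sqrt{n}} E(\|W_n\| \mid \mathcal{G}_n) \quad\text{a.s.}$$

If $\sup_n E\|W_n\|^k < \infty$ for some $k \geq 1$, it then follows that

$$E\{(\alpha_n \|\mu_n - a_n\|)^k\} \leq \Big(\frac{\alpha_n}{\sqrt{n}}\Big)^k E\|W_n\|^k \to 0 \quad\text{whenever } \frac{\alpha_n}{\sqrt{n}} \to 0.$$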

Even if obvious, this fact is potentially useful since

$$\sup_n E\|W_n\|^k < \infty \quad\text{for all } k \geq 1, \text{ if } X \text{ is exchangeable}, \qquad (2)$$

for various choices of $\mathcal{D}$; see Remark 3. In particular, (2) holds if $\mathcal{D}$ is finite.

The intriguing case, however, is $\alpha_n = \sqrt{n}$. For each $B \in \mathcal{B}$ and probability $Q$ on $(S^\infty, \mathcal{B}^\infty)$, write

$$C_n^Q(B) = E_Q(W_n(B) \mid \mathcal{G}_n) \quad\text{and}\quad C_n(B) = C_n^P(B) = \sqrt{n}\,\{\mu_n(B) - a_n(B)\}.$$

In Theorem 3.3 of [6], the asymptotic behavior of $C_n(B)$ is investigated for fixed $B$. Here, instead, we are interested in

$$\|C_n\| = \sup_{B \in \mathcal{D}} |C_n(B)| = \sqrt{n}\,\|\mu_n - a_n\|.$$

Our main result (Theorem 1) is the following. Fix a random probability measure $N$ on $\mathbb{R}$ and a probability $Q$ on $(S^\infty, \mathcal{B}^\infty)$ such that

$\|C_n^Q\| \to N$ stably under $Q$ and
$\|W_n\|$ is uniformly integrable under both $P$ and $Q$.

Then,

$$\|C_n\| \to N \text{ stably whenever } P \ll Q. \qquad (3)$$

A remarkable particular case is $N = \delta_0$. Suppose, in fact, that for some $Q$, one has $\|C_n^Q\| \xrightarrow{Q} 0$ and $\|W_n\|$ uniformly integrable under $P$ and $Q$. Then,

$$\|C_n\| \xrightarrow{P} 0 \quad\text{whenever } P \ll Q.$$

Stable convergence (in the sense of Rényi) is a stronger form of convergence in distribution. The definition is recalled in Section 2.

In general, one cannot dispense with the uniform integrability condition. However, this condition is often true. For instance, $\|W_n\|$ is uniformly integrable (under $P$ and $Q$) provided $\mathcal{D}$ meets (2) and $X$ is exchangeable (under $P$ and $Q$).

To make (3) concrete, a large list of reference probabilities $Q$ is needed. Various examples are available in the Bayesian nonparametrics framework; see, for example, [16] and references therein. The most popular is perhaps the Ferguson-Dirichlet law, denoted by $Q_0$. If $P = Q_0$, then $X$ is exchangeable and

$$a_n(B) = \frac{\alpha P(X_1 \in B) + n \mu_n(B)}{\alpha + n} \quad\text{a.s. for some constant } \alpha > 0.$$

Since $\|\mu_n - a_n\| \leq \alpha/n$ when $P = Q_0$, something more than $\|C_n\| \xrightarrow{P} 0$ can be expected in the case $P \ll Q_0$. Indeed, we prove that

$$n \|\mu_n - a_n\| = \sqrt{n}\,\|C_n\| \quad\text{converges a.s.}$$

whenever $P \ll Q_0$ with a density satisfying a certain condition; see Theorem 2 and Corollary 5.
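The bound $\|\mu_n - a_n\| \leq \alpha/n$ under a Ferguson-Dirichlet law can be checked numerically. The sketch below is only an illustration under stated assumptions: a finite state space $\{0, 1, 2\}$, $\mathcal{D}$ equal to the power set, and a Pólya urn sampler for the sequence. The function names (`polya_sequence`, `empirical_and_predictive`, `sup_distance`) and all numeric values are hypothetical, not taken from the paper.

```python
import random

def polya_sequence(n, alpha, base_weights, rng):
    """Draw X_1..X_n from a Polya urn scheme on {0, ..., k-1}.

    base_weights[j] plays the role of P(X_1 = j); alpha > 0 is the total mass.
    """
    states = list(range(len(base_weights)))
    counts = [0] * len(base_weights)
    draws = []
    for _ in range(n):
        # Predictive weights: alpha * P(X_1 = j) + number of past draws equal to j.
        weights = [alpha * base_weights[j] + counts[j] for j in states]
        x = rng.choices(states, weights=weights, k=1)[0]
        counts[x] += 1
        draws.append(x)
    return draws

def empirical_and_predictive(draws, alpha, base_weights):
    """Return (mu_n, a_n) as probability vectors over the states."""
    n = len(draws)
    k = len(base_weights)
    counts = [draws.count(j) for j in range(k)]
    mu_n = [c / n for c in counts]
    a_n = [(alpha * base_weights[j] + counts[j]) / (alpha + n) for j in range(k)]
    return mu_n, a_n

def sup_distance(p, q):
    """sup over all subsets B of |p(B) - q(B)| for two probability vectors."""
    return sum(max(pj - qj, 0.0) for pj, qj in zip(p, q))

rng = random.Random(0)
alpha, base = 2.0, [0.5, 0.3, 0.2]
draws = polya_sequence(500, alpha, base, rng)
mu_n, a_n = empirical_and_predictive(draws, alpha, base)
dist = sup_distance(mu_n, a_n)
print(dist, alpha / len(draws))
```

Since both vectors sum to one, the supremum over subsets equals the sum of the positive parts of the pointwise differences, which is what `sup_distance` computes.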

One more example should be mentioned. Let $X_n = (Y_n, Z_n)$, where $Z_n > 0$ and

$$P(Y_{n+1} \in B \mid \mathcal{G}_n) = \frac{\alpha P(Y_1 \in B) + \sum_{i=1}^n Z_i I_B(Y_i)}{\alpha + \sum_{i=1}^n Z_i} \quad\text{a.s.}$$

for some constant $\alpha > 0$. Under some conditions, $X$ is c.i.d. (but not necessarily exchangeable), $\|W_n\|$ is uniformly integrable and $\|C_n\|$ converges stably; see Section 4.

The above material takes a nicer form when the condition $P \ll Q$ can be given a simple characterization. This happens, for instance, if $S = \{x_1, \ldots, x_k, x_{k+1}\}$ is finite, $X$ exchangeable and $P(X_1 = x) > 0$ for all $x \in S$. Then, $P \ll Q_0$ (for some choice of $Q_0$) if and only if

$$(\mu\{x_1\}, \ldots, \mu\{x_k\})$$

has an absolutely continuous distribution with respect to Lebesgue measure. In this particular case, however, a part of our results can also be obtained through the Bernstein-von Mises theorem; see Section 3.
Finally, we make two remarks:

(i) If $X$ is exchangeable, our results apply to Bayesian predictive inference. Suppose, in fact, that $S$ is Polish and $\mathcal{B}$ the Borel $\sigma$-field, so that de Finetti's theorem applies. Then, $P$ is a unique mixture of product probabilities on $\mathcal{B}^\infty$ and the mixing measure is called the prior distribution in a Bayesian framework. Now, given $Q$, $P \ll Q$ is just an assumption on the prior distribution. This is plain in the last example where $S = \{x_1, \ldots, x_k, x_{k+1}\}$. In Bayesian terms, such an example can be summarized as follows. For a multinomial statistical model, $\|C_n\| \xrightarrow{P} 0$ if the prior is absolutely continuous with respect to Lebesgue measure, and $\sqrt{n}\,\|C_n\|$ converges a.s. if the prior density satisfies a certain condition.

(ii) To our knowledge, there is no general representation for the predictive distributions of an exchangeable sequence. Such a representation would be very useful. Even if only partially, results like (3) contribute to filling the gap. As an example, for fixed $B \in \mathcal{B}$, one obtains $a_n(B) = \mu_n(B) + o_P(1/\sqrt{n})$, provided $X$ is exchangeable and $P \ll Q$ for some $Q$ such that $C_n^Q(B) \xrightarrow{Q} 0$ and $W_n(B)$ is uniformly integrable.

2. Main results 

A few definitions need to be recalled. Let $T$ be a metric space, $\mathcal{B}_T$ the Borel $\sigma$-field on $T$ and $(\Omega, \mathcal{A}, P)$ a probability space. A random probability measure on $T$ is a mapping $N$ on $\Omega \times \mathcal{B}_T$ such that: (i) $N(\omega, \cdot)$ is a probability on $\mathcal{B}_T$ for each $\omega \in \Omega$; (ii) $N(\cdot, B)$ is $\mathcal{A}$-measurable for each $B \in \mathcal{B}_T$. Let $(Z_n)$ be a sequence of $T$-valued random variables and $N$ a random probability measure on $T$. Both $(Z_n)$ and $N$ are defined on $(\Omega, \mathcal{A}, P)$. We say that $Z_n$ converges stably to $N$ in the case where

$$P(Z_n \in \cdot \mid H) \to E(N(\cdot) \mid H) \quad\text{weakly for all } H \in \mathcal{A} \text{ such that } P(H) > 0.$$

Clearly, if $Z_n \to N$ stably, then $Z_n$ converges in distribution to the probability law $E(N(\cdot))$ (just let $H = \Omega$). Stable convergence has been introduced by Rényi in [17] and subsequently investigated by various authors; see [9] for more information.

Next, we say that $\mathcal{D} \subset \mathcal{B}$ is countably determined in the case where, for some fixed countable subclass $\mathcal{D}_0 \subset \mathcal{D}$, one obtains $\sup_{B \in \mathcal{D}_0} |\nu_1(B) - \nu_2(B)| = \sup_{B \in \mathcal{D}} |\nu_1(B) - \nu_2(B)|$ for every pair $\nu_1, \nu_2$ of probabilities on $\mathcal{B}$. A sufficient condition is that for some countable $\mathcal{D}_0 \subset \mathcal{D}$, and for every $\epsilon > 0$, $B \in \mathcal{D}$ and probability $\nu$ on $\mathcal{B}$, there is $B_0 \in \mathcal{D}_0$ satisfying $\nu(B \Delta B_0) < \epsilon$. Most classes $\mathcal{D}$ involved in applications are countably determined. For instance, $\mathcal{D} = \{(-\infty, t] : t \in \mathbb{R}^k\}$ and $\mathcal{D} = \{\text{closed balls}\}$ are countably determined if $S = \mathbb{R}^k$ and $\mathcal{B}$ is the Borel $\sigma$-field. As another example, $\mathcal{D} = \mathcal{B}$ is countably determined if $\mathcal{B}$ is countably generated.
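To see why a small subclass can carry the whole supremum, consider the half-lines $(-\infty, t]$ on the real line and two finitely supported probabilities. Both distribution functions are constant between atoms, so the supremum over all $t$ is already attained on the finite set of atoms. The sketch below illustrates only this special case, not the general definition; the function names and the two measures are hypothetical.

```python
from fractions import Fraction

def cdf(measure, t):
    """nu(-inf, t] for a finitely supported measure given as {atom: mass}."""
    return sum(mass for atom, mass in measure.items() if atom <= t)

def sup_over_halflines(nu1, nu2):
    """sup_t |nu1(-inf, t] - nu2(-inf, t]|, evaluated on the finite set of atoms."""
    atoms = sorted(set(nu1) | set(nu2))
    return max(abs(cdf(nu1, t) - cdf(nu2, t)) for t in atoms)

nu1 = {0: Fraction(1, 2), 1: Fraction(1, 4), 3: Fraction(1, 4)}
nu2 = {0: Fraction(1, 4), 2: Fraction(1, 2), 3: Fraction(1, 4)}
d_atoms = sup_over_halflines(nu1, nu2)

# Brute-force comparison on a fine grid of thresholds in [-1, 4].
grid = [Fraction(i, 10) for i in range(-10, 41)]
d_grid = max(abs(cdf(nu1, t) - cdf(nu2, t)) for t in grid)
print(d_atoms, d_grid)
```

Exact rational arithmetic is used so that the two suprema can be compared without rounding error.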

We are now in a position to state our main result. Let $N$ be a random probability measure on $\mathbb{R}$, defined on the measurable space $(S^\infty, \mathcal{B}^\infty)$, and let $Q$ be a probability on $(S^\infty, \mathcal{B}^\infty)$.

Theorem 1. Let $\mathcal{D}$ be countably determined. Suppose $\|C_n^Q\| \to N$ stably under $Q$, and $(\|W_n\| : n \geq 1)$ is uniformly integrable under $P$ and $Q$. Then,

$$\|C_n\| = \sqrt{n}\,\|\mu_n - a_n\| \to N \quad\text{stably whenever } P \ll Q.$$
Proof. Since $\mathcal{D}$ is countably determined, there are no measurability problems in taking $\sup_{B \in \mathcal{D}}$. In particular, $\|W_n\|$ and $\|C_n\|$ are random variables and $\|C_n\|$ is $\mathcal{G}_n$-measurable. Let $f$ be a version of $dP/dQ$ and $U_n = f - E_Q(f \mid \mathcal{G}_n)$. Then,

$$C_n(B) = E(W_n(B) \mid \mathcal{G}_n) = \frac{E_Q(f\,W_n(B) \mid \mathcal{G}_n)}{E_Q(f \mid \mathcal{G}_n)} = C_n^Q(B) + \frac{E_Q(U_n W_n(B) \mid \mathcal{G}_n)}{E_Q(f \mid \mathcal{G}_n)}, \quad P\text{-a.s., for each } B \in \mathcal{B}.$$

Letting $M_n = E_Q(|U_n|\,\|W_n\| \mid \mathcal{G}_n) / E_Q(f \mid \mathcal{G}_n)$ and taking $\sup_{B \in \mathcal{D}}$, it follows that

$$\|C_n^Q\| - M_n \leq \|C_n\| \leq \|C_n^Q\| + M_n, \quad P\text{-a.s.}$$

We first assume $f$ to be bounded. Since $\|C_n^Q\| \to N$ stably under $Q$, given a bounded random variable $Z$ on $(S^\infty, \mathcal{B}^\infty)$, one obtains

$$\int \phi(\|C_n^Q\|)\,Z\,dQ \to \int N(\phi)\,Z\,dQ$$

for each bounded continuous $\phi : \mathbb{R} \to \mathbb{R}$, where $N(\phi) = \int \phi(x)\,N(\cdot, dx)$.

Letting $Z = f I_H / P(H)$ with $H \in \mathcal{B}^\infty$ and $P(H) > 0$, it follows that $\|C_n^Q\| \to N$ stably under $P$. Therefore, it suffices to prove that $E M_n \to 0$. Given $\epsilon > 0$, since $\|W_n\|$ is uniformly integrable under $Q$, there exists some $c > 0$ such that

$$E_Q\{\|W_n\|\,I_{\{\|W_n\| > c\}}\} < \frac{\epsilon}{\sup f} \quad\text{for all } n.$$

Since $M_n$ is $\mathcal{G}_n$-measurable,

$$E M_n = E_Q(f M_n) = E_Q(E_Q(f \mid \mathcal{G}_n) M_n) = E_Q(|U_n|\,\|W_n\|) \leq c\,E_Q|U_n| + (\sup f)\,E_Q\{\|W_n\|\,I_{\{\|W_n\| > c\}}\} < c\,E_Q|U_n| + \epsilon \quad\text{for all } n.$$

Therefore, the martingale convergence theorem implies that

$$\limsup_n E M_n \leq c \limsup_n E_Q|U_n| + \epsilon = \epsilon.$$

This concludes the proof when $f$ is bounded.

Next, let $f$ be any density. Fix $k > 0$ such that $P(f \leq k) > 0$ and define $K = \{f \leq k\}$ and $P_K(\cdot) = P(\cdot \mid K)$. Then, $P_K$ has the bounded density $f I_K / P(K)$ with respect to $Q$. By what has already been proven, $\|C_n^{P_K}\| \to N$ stably under $P_K$, where

$$C_n^{P_K}(B) = E_{P_K}(W_n(B) \mid \mathcal{G}_n) = \frac{E\{I_K W_n(B) \mid \mathcal{G}_n\}}{E(I_K \mid \mathcal{G}_n)}, \quad P_K\text{-a.s.}$$

Letting $R_n = I_K - E(I_K \mid \mathcal{G}_n)$, it follows that

$$E\{I_K \|C_n - C_n^{P_K}\|\} = E\Big\{ I_K \sup_{B \in \mathcal{D}} \frac{|E\{R_n W_n(B) \mid \mathcal{G}_n\}|}{E(I_K \mid \mathcal{G}_n)} \Big\} \leq c\,E|R_n| + E\{\|W_n\|\,I_{\{\|W_n\| > c\}}\} \quad\text{for all } c > 0.$$

Since $E|R_n| \to 0$ and $\|W_n\|$ is uniformly integrable under $P$, arguing as above gives that

$$E_{P_K}\big|\,\|C_n\| - \|C_n^{P_K}\|\,\big| \leq \frac{E\{I_K \|C_n - C_n^{P_K}\|\}}{P(K)} \to 0.$$

Therefore, $\|C_n\| \to N$ stably under $P_K$. Finally, fix $H \in \mathcal{B}^\infty$ with $P(H) > 0$ and a bounded continuous function $\phi : \mathbb{R} \to \mathbb{R}$. Then $P(H \cap K) = P(H \cap \{f \leq k\}) > 0$ for $k$ sufficiently large and

$$P(H)\,|E(\phi(\|C_n\|) \mid H) - E(N(\phi) \mid H)| \leq 2 \sup|\phi|\,P(f > k) + |E(\phi(\|C_n\|) \mid H \cap K) - E(N(\phi) \mid H \cap K)|.$$

Since $E(\phi(\|C_n\|) \mid H \cap K) \to E(N(\phi) \mid H \cap K)$ as $n \to \infty$ and $P(f > k) \to 0$ as $k \to \infty$, this concludes the proof. □

Next, we deal with the particular case $Q = Q_0$, where $Q_0$ is a Ferguson-Dirichlet law on $(S^\infty, \mathcal{B}^\infty)$. If $P \ll Q_0$ with a density satisfying a certain condition, the convergence rate of $\|\mu_n - a_n\|$ can be remarkably improved.

Theorem 2. Suppose $\mathcal{D}$ is countably determined and $\sup_n E_{Q_0}\|W_n\|^2 < \infty$. Then, $\sqrt{n}\,\|C_n\| = n \|\mu_n - a_n\|$ converges a.s., provided $P \ll Q_0$ and

$$E_{Q_0}(f^2) - E_{Q_0}\{E_{Q_0}(f \mid \mathcal{G}_n)^2\} = O\Big(\frac{1}{n}\Big) \quad\text{for some version } f \text{ of } \frac{dP}{dQ_0}.$$

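Proof. Let $D_n(B) = \sqrt{n}\,C_n(B)$. Then, $\|D_n\|$ is $\mathcal{G}_n$-measurable (as $\mathcal{D}$ is countably determined) and

$$E(\|D_{n+1}\| \mid \mathcal{G}_n) = E\Big( \sup_{B \in \mathcal{D}} \Big| \sum_{i=1}^{n+1} I_B(X_i) - (n+1) E(\mu(B) \mid \mathcal{G}_{n+1}) \Big| \,\Big|\, \mathcal{G}_n \Big)$$
$$\geq \sup_{B \in \mathcal{D}} \Big| E\Big( \sum_{i=1}^{n+1} I_B(X_i) \,\Big|\, \mathcal{G}_n \Big) - (n+1) E(\mu(B) \mid \mathcal{G}_n) \Big|$$
$$= \sup_{B \in \mathcal{D}} \Big| \sum_{i=1}^{n} I_B(X_i) - n E(\mu(B) \mid \mathcal{G}_n) \Big| = \|D_n\| \quad\text{a.s.}$$

Since $\|D_n\|$ is a $\mathcal{G}_n$-submartingale, it suffices to prove that $\sup_n E\|D_n\| < \infty$.

Let $U_n = f - E_0(f \mid \mathcal{G}_n)$, where $E_0$ stands for $E_{Q_0}$. By assumption, there exist $c_1, c_2 > 0$ such that

$$E_0\|W_n\|^2 \leq c_1, \qquad n E_0 U_n^2 = n\{E_0(f^2) - E_0(E_0(f \mid \mathcal{G}_n)^2)\} \leq c_2 \quad\text{for all } n.$$

As noted in Section 1, since $Q_0$ is a Ferguson-Dirichlet law, there is an $\alpha > 0$ such that

$$\sqrt{n}\,\|C_n^{Q_0}\| = \sqrt{n} \sup_{B \in \mathcal{D}} |E_0(W_n(B) \mid \mathcal{G}_n)| \leq \alpha \quad\text{for all } n.$$

Define $M_n = E_0(|U_n|\,\|W_n\| \mid \mathcal{G}_n) / E_0(f \mid \mathcal{G}_n)$ and recall that $\|C_n\| \leq \|C_n^{Q_0}\| + M_n$, $P$-a.s.; see the proof of Theorem 1. Then, for all $n$, one obtains

$$E\|D_n\| = \sqrt{n}\,E\|C_n\| \leq \sqrt{n}\,(E\|C_n^{Q_0}\| + E M_n) \leq \alpha + \sqrt{n}\,E_0(f M_n)$$
$$= \alpha + \sqrt{n}\,E_0(|U_n|\,\|W_n\|) \leq \alpha + \sqrt{n}\,\sqrt{E_0 U_n^2\,E_0\|W_n\|^2}$$
$$\leq \alpha + \sqrt{c_1 n E_0 U_n^2} \leq \alpha + \sqrt{c_1 c_2}. \qquad\Box$$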

Finally, we clarify a point raised in Section 1. 

Remark 3. There is a long list of (countably determined) choices of $\mathcal{D}$ such that

$$\sup_n E\|W_n\|^k \leq c(k) \quad\text{for all } k \geq 1, \text{ if } X \text{ is i.i.d.},$$

where $c(k)$ is some universal constant; see, for example, Sections 2.14.1 and 2.14.2 of [21]. Fix one such $\mathcal{D}$, $k \geq 1$, and suppose that $S$ is Polish and $\mathcal{B}$ is the Borel $\sigma$-field. If $X$ is exchangeable, then de Finetti's theorem yields $E(\|W_n\|^k \mid \mathcal{T}) \leq c(k)$ a.s. for all $n$, where $\mathcal{T}$ is the tail $\sigma$-field of $X$. Hence, $E\|W_n\|^k = E\{E(\|W_n\|^k \mid \mathcal{T})\} \leq c(k)$ for all $n$. This proves inequality (2).

3. Exchangeable data with finite state space 

When $X$ is exchangeable and $S$ finite, there is some overlap between Theorem 1 and a result of Bernstein and von Mises.

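3.1. Connections with the Bernstein-von Mises theorem

For each $\theta$ in an open set $\Theta \subset \mathbb{R}^k$, let $P_\theta$ be a product probability on $(S^\infty, \mathcal{B}^\infty)$ (that is, $X$ is i.i.d. under $P_\theta$). Suppose the map $\theta \mapsto P_\theta(B)$ is Borel measurable for fixed $B \in \mathcal{B}^\infty$. Given a (prior) probability $\pi$ on the Borel subsets of $\Theta$, define

$$P(B) = \int P_\theta(B)\,\pi(d\theta), \qquad B \in \mathcal{B}^\infty.$$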
Roughly speaking, the Bernstein-von Mises (BvM) theorem can be stated as follows. Suppose $\pi$ is absolutely continuous with respect to Lebesgue measure and the statistical model $(P_\theta : \theta \in \Theta)$ is suitably "smooth" (we refer to [13] for a detailed exposition of what "smooth" means). For each $n$, suppose that $\theta$ admits a (consistent) maximum likelihood estimator $\hat\theta_n$. Further, suppose the prior $\pi$ possesses the first moment and denote by $\theta_n^*$ the posterior mean of $\theta$. Then,

$$\sqrt{n}\,(\hat\theta_n - \theta_n^*) \xrightarrow{P_{\theta_0}} 0$$

for each $\theta_0 \in \Theta$ such that the density of $\pi$ is strictly positive and continuous at $\theta_0$.

Actually, the BvM theorem yields much more than asserted; what is reported above is just the corollary connected to this paper. We refer to [13] and [14] for more information and historical notes; see also [18].

Assuming a smooth, finite-dimensional statistical model is fundamental; see, for example, [11]. Indeed, the BvM theorem does not apply when the only information is that $X$ is exchangeable (or even c.i.d.) and $P \ll Q$ for some reference probability $Q$. One exception, however, is when $S$ is finite.

Let us suppose

$S = \{x_1, \ldots, x_k, x_{k+1}\}$, $X$ is exchangeable, $P(X_1 = x) > 0$ for all $x \in S$ and $\mathcal{D} = \mathcal{B}$ = power set of $S$.

Also, let $\lambda$ denote Lebesgue measure on $\mathbb{R}^k$ and $\pi$ the probability distribution of

$$\theta = (\mu\{x_1\}, \ldots, \mu\{x_k\}).$$

As noted in Section 1, $\pi \ll \lambda$ if and only if $P \ll Q_0$ for some choice of $Q_0$. Since $\mathcal{D}$ is finite and $X$ exchangeable under $P$ and $Q_0$, $\|W_n\|$ is uniformly integrable under $P$ and $Q_0$. Thus, Theorem 1 yields $\|C_n\| \xrightarrow{P} 0$ whenever $\pi \ll \lambda$. On the other hand, $\pi$ is the prior distribution for this problem. The underlying statistical model is smooth and finite-dimensional (it is just a multinomial model). Further, for each $n$, the maximum likelihood estimator and the posterior mean of $\theta$ are, respectively,

$$\hat\theta_n = (\mu_n\{x_1\}, \ldots, \mu_n\{x_k\}), \qquad \theta_n^* = (a_n\{x_1\}, \ldots, a_n\{x_k\}).$$

Thus, the BvM theorem implies that $\|C_n\| \xrightarrow{P} 0$, provided $\pi \ll \lambda$ and the density of $\pi$ is continuous on the complement of a $\pi$-null set.

To sum up, in this particular case, the same conclusions as from Theorem 1 can be drawn from the BvM theorem. Unlike the latter, however, Theorem 1 does not require any conditions on the density of $\pi$.
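The closeness of the maximum likelihood estimator and the posterior mean in the multinomial model can be made tangible with a conjugate prior. The sketch below assumes (this is not in the text) that the prior is a Dirichlet distribution, so that the posterior mean of cell $j$ is (prior parameter + count) / (total prior mass + $n$). The parameter values, the "true" cell probabilities and the function name are hypothetical.

```python
import math
import random

def mle_and_posterior_mean(counts, prior):
    """Return (MLE, posterior mean) of the cell probabilities.

    counts[j] = number of observations equal to x_j;
    prior[j]  = Dirichlet parameter of cell j (an assumed conjugate prior).
    """
    n = sum(counts)
    total = sum(prior)
    mle = [c / n for c in counts]
    post = [(a + c) / (total + n) for a, c in zip(prior, counts)]
    return mle, post

rng = random.Random(1)
theta = [0.2, 0.5, 0.3]      # hypothetical data-generating cell probabilities
prior = [1.0, 2.0, 0.5]      # hypothetical Dirichlet parameters
gaps = []
for n in (100, 10_000):
    sample = rng.choices(range(3), weights=theta, k=n)
    counts = [sample.count(j) for j in range(3)]
    mle, post = mle_and_posterior_mean(counts, prior)
    # sqrt(n) times the largest coordinate-wise gap between the two estimators.
    gap = math.sqrt(n) * max(abs(m - p) for m, p in zip(mle, post))
    gaps.append(gap)
    print(n, gap)
```

Under this assumed prior each coordinate gap is at most (total prior mass) / (total prior mass + $n$), so the scaled gap is of order $1/\sqrt{n}$ regardless of the sample.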

3.2. Some consequences of Theorems 1 and 2 

In this subsection, we focus on $S = \{0, 1\}$. Thus, $\mathcal{D} = \mathcal{B}$ = power set of $S$ and $\lambda$ denotes Lebesgue measure on $\mathbb{R}$. Let $\mathcal{N}(0, a)$ denote the one-dimensional Gaussian law with mean 0 and variance $a \geq 0$ (where $\mathcal{N}(0, 0) = \delta_0$). Our first result allows $\pi$ to have a discrete part.

Corollary 4. With $S = \{0, 1\}$, let $\pi$ be the probability distribution of $\mu\{1\}$ and

$$\Delta = \{\theta \in [0, 1] : \pi\{\theta\} > 0\}, \qquad A = \{\omega \in S^\infty : \mu(\omega, \{1\}) \in \Delta\}.$$

Define the random probability measure $N$ on $\mathbb{R}$ as

$$N = (1 - I_A)\,\delta_0 + I_A\,\mathcal{N}(0, \mu\{1\}(1 - \mu\{1\})).$$

If $X$ is exchangeable and $\pi$ does not have a singular continuous part, then

$$C_n\{1\} \to N \text{ stably} \quad\text{and}\quad \|C_n\| \to N \circ h^{-1} \text{ stably},$$

where $h(x) = |x|$, $x \in \mathbb{R}$, is the modulus function.

Proof. By standard arguments, the corollary holds when $\pi(\Delta) \in (0, 1)$, provided it holds when $\pi(\Delta) = 0$ and $\pi(\Delta) = 1$. Let $\pi(\Delta) = 0$. Then, $\pi \ll \lambda$ as $\pi$ does not have a singular continuous part, and the corollary follows from Theorem 1. Thus, it can be assumed that $\pi(\Delta) = 1$. Since $C_n\{0\} = -C_n\{1\}$, $\|C_n\| = |C_n\{1\}|$ and the modulus function is continuous, it suffices to prove that $C_n\{1\} \to N$ stably.

Next, exchangeability of $X$ implies that $W_n\{1\} \to \mathcal{N}(0, \mu\{1\}(1 - \mu\{1\}))$ stably; see, for example, Theorem 3.1 of [6]. Since $\pi(\Delta) = 1$, we have $N = \mathcal{N}(0, \mu\{1\}(1 - \mu\{1\}))$ a.s. Hence, it is enough to show that $E|C_n\{1\} - W_n\{1\}| \to 0$.

Fix $\epsilon > 0$ and let $M_n = W_n\{1\}$. Since $X$ is exchangeable, $M_n$ is uniformly integrable. Therefore, there exists some $c > 0$ such that

$$\sup_n E(|M_n|\,I_{\{|M_n| > c\}}) < \frac{\epsilon}{4}.$$

Define $\phi(x) = x$ if $|x| \leq c$, $\phi(x) = c$ if $x > c$, and $\phi(x) = -c$ if $x < -c$. Since $C_n\{1\} = E(M_n \mid \mathcal{G}_n)$ a.s., it follows that

$$E|C_n\{1\} - W_n\{1\}| \leq E|E(M_n \mid \mathcal{G}_n) - E(\phi(M_n) \mid \mathcal{G}_n)| + E|E(\phi(M_n) \mid \mathcal{G}_n) - \phi(M_n)| + E|\phi(M_n) - M_n|$$
$$\leq E|E(\phi(M_n) \mid \mathcal{G}_n) - \phi(M_n)| + 4 E(|M_n|\,I_{\{|M_n| > c\}})$$
$$< E|E(\phi(M_n) \mid \mathcal{G}_n) - \phi(M_n)| + \epsilon \quad\text{for all } n.$$

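Write $\Delta = \{a_1, a_2, \ldots\}$ and $M_{n,j} = \sqrt{n}\,(\mu_n\{1\} - a_j)$. Since $\sigma(M_{n,j}) \subset \mathcal{G}_n$ and $P(\mu\{1\} \in \Delta) = \pi(\Delta) = 1$, one also obtains

$$E|E(\phi(M_n) \mid \mathcal{G}_n) - \phi(M_n)| \leq \sum_j E\big|E(\phi(M_{n,j})\,I_{\{\mu\{1\} = a_j\}} \mid \mathcal{G}_n) - \phi(M_{n,j})\,I_{\{\mu\{1\} = a_j\}}\big|$$
$$= \sum_j E\big|\phi(M_{n,j})\,\{P(\mu\{1\} = a_j \mid \mathcal{G}_n) - I_{\{\mu\{1\} = a_j\}}\}\big|$$
$$\leq c \sum_{j=1}^m E\big|P(\mu\{1\} = a_j \mid \mathcal{G}_n) - I_{\{\mu\{1\} = a_j\}}\big| + 2c \sum_{j > m} \pi\{a_j\} \quad\text{for all } m, n.$$

By the martingale convergence theorem, $E|P(\mu\{1\} = a_j \mid \mathcal{G}_n) - I_{\{\mu\{1\} = a_j\}}| \to 0$ as $n \to \infty$, for each $j$. Thus,

$$\limsup_n E|C_n\{1\} - W_n\{1\}| \leq \epsilon + 2c \sum_{j > m} \pi\{a_j\} \quad\text{for all } m.$$

Taking the limit as $m \to \infty$ completes the proof. □

If $\pi$ is singular continuous, we conjecture that $C_n\{1\}$ converges stably to a non-null limit. However, we do not have a proof.

In the next result, a real function $g$ on $(0, 1)$ is said to be almost Lipschitz in the case where $x \mapsto g(x)\,x^a (1 - x)^b$ is Lipschitz on $(0, 1)$ for some reals $a, b < 1$.

Corollary 5. Suppose $S = \{0, 1\}$, $X$ is exchangeable and $\pi$ is the probability distribution of $\mu\{1\}$. If $\pi$ admits an almost Lipschitz density with respect to $\lambda$, then $\sqrt{n}\,\|C_n\|$ converges a.s. to a real random variable.

Proof. Let $V = \mu\{1\}$. By assumption, there exist $a, b < 1$ and a version $g$ of $d\pi/d\lambda$ such that $\phi(\theta) = g(\theta)\,\theta^a (1 - \theta)^b$ is Lipschitz on $(0, 1)$. For each $u_1, u_2 > 0$, we can take $Q_0$ such that $V$ has a beta-distribution with parameters $u_1, u_2$ under $Q_0$. Let $Q_0$ be such that $V$ has a beta-distribution with parameters $u_1 = 1 - a$ and $u_2 = 1 - b$ under $Q_0$. Then, for any $n \geq 1$ and $x_1, \ldots, x_n \in \{0, 1\}$, one obtains

$$P(X_1 = x_1, \ldots, X_n = x_n) = \int_0^1 \theta^r (1 - \theta)^{n - r} g(\theta)\,d\theta = \int_0^1 \theta^r (1 - \theta)^{n - r} \phi(\theta)\,\theta^{-a} (1 - \theta)^{-b}\,d\theta = \int_{\{X_1 = x_1, \ldots, X_n = x_n\}} c\,\phi(V)\,dQ_0,$$

where $r = \sum_{i=1}^n x_i$ and $c = \int_0^1 \theta^{-a} (1 - \theta)^{-b}\,d\theta$ is the normalizing constant of the beta density with parameters $1 - a$ and $1 - b$.

Let $h = c\phi$. Then, $h$ is Lipschitz and $f = h(V)$ is a version of $dP/dQ_0$.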

Let $V_n = E_0(V \mid \mathcal{G}_n)$, where $E_0$ stands for $E_{Q_0}$. Since $h$ is Lipschitz,

$$|f - E_0(f \mid \mathcal{G}_n)| \leq |h(V) - h(V_n)| + E_0(|h(V) - h(V_n)| \mid \mathcal{G}_n) \leq d\,|V - V_n| + d\,E_0(|V - V_n| \mid \mathcal{G}_n),$$

where $d$ is the Lipschitz constant of $h$. Since $E_0\|C_n^{Q_0}\|^2 \leq E_0\|W_n\|^2$ and

$$\sqrt{n}\,|V - V_n| = |C_n^{Q_0}\{1\} - W_n\{1\}| \leq \|C_n^{Q_0}\| + \|W_n\|,$$

it follows that

$$E_0(f^2) - E_0(E_0(f \mid \mathcal{G}_n)^2) = E_0\{(f - E_0(f \mid \mathcal{G}_n))^2\} \leq 4 d^2 E_0\{(V - V_n)^2\}$$
$$\leq \frac{4 d^2}{n} E_0\{(\|C_n^{Q_0}\| + \|W_n\|)^2\} \leq \frac{16 d^2}{n} E_0\|W_n\|^2.$$

Since $\sup_n E_0\|W_n\|^2 < \infty$, we have $E_0(f^2) - E_0(E_0(f \mid \mathcal{G}_n)^2) = O(1/n)$. An application of Theorem 2 completes the proof. □
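The simplest instance of this a.s. convergence is the case where $\mu\{1\}$ has exactly a beta prior with parameters $u_1, u_2$ (so the reference law and $P$ coincide). Then the predictive probability of a 1 is $(u_1 + \sum_i x_i)/(u_1 + u_2 + n)$, and a short computation gives $n(\mu_n\{1\} - a_n\{1\}) = ((u_1 + u_2)\mu_n\{1\} - u_1)\,n/(u_1 + u_2 + n)$, which tends a.s. to $(u_1 + u_2)\mu\{1\} - u_1$. The sketch below simulates this special case; the parameter values and the function name `scaled_gap` are hypothetical.

```python
import random

def scaled_gap(xs, u1, u2):
    """n * |mu_n{1} - a_n{1}| when mu{1} has a Beta(u1, u2) prior."""
    n = len(xs)
    s = sum(xs)
    mu_n = s / n
    a_n = (u1 + s) / (u1 + u2 + n)   # Beta-Bernoulli predictive probability of a 1
    return n * abs(mu_n - a_n)

rng = random.Random(2)
u1, u2 = 2.0, 3.0
v = rng.betavariate(u1, u2)                      # V = mu{1}, drawn from the prior
xs = [1 if rng.random() < v else 0 for _ in range(100_000)]
gap = scaled_gap(xs, u1, u2)
limit = abs((u1 + u2) * v - u1)                  # a.s. limit derived in the lead-in
print(gap, limit)
```

With $S = \{0, 1\}$ the sup-norm over the power set reduces to the single coordinate $|\mu_n\{1\} - a_n\{1\}|$, which is why one number suffices here.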

Corollaries 4 and 5 deal with S = {0, 1}, but similar results can be proven for any finite 
S; see also [12] and [19]. 

4. Generalized Polya urns 

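In this section, based on Examples 1.3 and 3.5 of [6], the asymptotic behavior of $\|C_n\|$ is investigated for a certain c.i.d. sequence.

Let $(\mathcal{Y}, \mathcal{B}_{\mathcal{Y}})$ be a measurable space, $\mathcal{B}_+$ the Borel $\sigma$-field on $(0, \infty)$ and

$$S = \mathcal{Y} \times (0, \infty), \qquad \mathcal{B} = \mathcal{B}_{\mathcal{Y}} \otimes \mathcal{B}_+, \qquad X_n = (Y_n, Z_n),$$

where $Y_n(\omega) = y_n$, $Z_n(\omega) = z_n$ for all $\omega = (y_1, z_1, y_2, z_2, \ldots) \in S^\infty$.

Given a law $P$ on $\mathcal{B}^\infty$, it is assumed that

$$P(Y_{n+1} \in B \mid \mathcal{G}_n) = \frac{\alpha P(Y_1 \in B) + \sum_{i=1}^n Z_i I_B(Y_i)}{\alpha + \sum_{i=1}^n Z_i} \quad\text{a.s., } n \geq 1, \qquad (4)$$

$$P(Z_{n+1} \in C \mid X_1, \ldots, X_n, Y_{n+1}) = P(Z_1 \in C) \quad\text{a.s., } n \geq 0, \qquad (5)$$

for some constant $\alpha > 0$ and all $B \in \mathcal{B}_{\mathcal{Y}}$, $C \in \mathcal{B}_+$. Note that $(Z_n)$ is i.i.d. and $Z_{n+1}$ is independent of $(Y_1, Z_1, \ldots, Y_n, Z_n, Y_{n+1})$ for all $n \geq 0$.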

In real problems, the $Z_n$ should be viewed as weights, while the $Y_n$ describe the phenomenon of interest. As an example, consider an urn containing white and black balls. At each time $n \geq 1$, a ball is drawn and then replaced together with $Z_n$ more balls of the same color. Let $Y_n$ be the indicator of the event {white ball at time $n$} and suppose that $Z_n$ is chosen according to a fixed distribution on the integers, independently of $(Y_1, Z_1, \ldots, Y_{n-1}, Z_{n-1}, Y_n)$. The predictive distributions of $X$ are then given by (4)-(5). Also, note that the probability law of $(Y_n)$ is Ferguson-Dirichlet in the case where $Z_n = 1$ for all $n$.
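The urn scheme just described is easy to simulate. The following sketch is an illustration only: it uses two colors, treats the initial urn as holding mass $\alpha\,P(Y_1 = \text{white})$ of white and $\alpha\,(1 - P(Y_1 = \text{white}))$ of black, and draws the reinforcements from a hypothetical distribution on $\{1, 2, 3\}$. The function names and numeric values are assumptions, not part of the paper.

```python
import random

def simulate_urn(n, alpha, p_white, weight_sampler, rng):
    """Simulate (Y_i, Z_i), i = 1..n, for a two-color urn with random reinforcement.

    Y_i = 1 for a white ball, 0 for black; Z_i > 0 is the reinforcement.
    alpha * p_white and alpha * (1 - p_white) are the initial white/black masses.
    """
    white, black = alpha * p_white, alpha * (1.0 - p_white)
    ys, zs = [], []
    for _ in range(n):
        y = 1 if rng.random() < white / (white + black) else 0
        z = weight_sampler(rng)      # drawn independently of the past and of y
        if y == 1:
            white += z
        else:
            black += z
        ys.append(y)
        zs.append(z)
    return ys, zs

def predictive_white(ys, zs, alpha, p_white):
    """Weighted predictive probability of a white ball at the next draw."""
    num = alpha * p_white + sum(z for y, z in zip(ys, zs) if y == 1)
    return num / (alpha + sum(zs))

rng = random.Random(3)
alpha, p_white = 1.0, 0.5
ys, zs = simulate_urn(2000, alpha, p_white, lambda r: r.choice([1, 2, 3]), rng)
a_n = predictive_white(ys, zs, alpha, p_white)
mu_n = sum(ys) / len(ys)
print(a_n, mu_n)
```

When every reinforcement equals 1, `predictive_white` reduces to the Ferguson-Dirichlet predictive $(\alpha P(Y_1 \in B) + n\mu_n(B))/(\alpha + n)$, in line with the last sentence above.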

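It is not hard to prove that $X$ is c.i.d. We state this fact as a lemma.

Lemma 6. The sequence $X$ assessed according to (4)-(5) is c.i.d.

Proof. Fix $k > n \geq 0$ and $A \in \mathcal{B}_{\mathcal{Y}} \otimes \mathcal{B}_+$. By a monotone class argument, it can be assumed that $A = B \times C$, where $B \in \mathcal{B}_{\mathcal{Y}}$ and $C \in \mathcal{B}_+$. Further, it can be assumed that $k = n + 2$. Let $n = 0$ and $\mathcal{G}_0$ be the trivial $\sigma$-field. Since $X_2 \sim X_1$ (as is easily seen), $E(I_B(Y_2) I_C(Z_2) \mid \mathcal{G}_0) = E(I_B(Y_1) I_C(Z_1) \mid \mathcal{G}_0)$ a.s. If $n \geq 1$, define

$$\mathcal{G}_n^* = \sigma(X_1, \ldots, X_n, Z_{n+1}).$$

Noting that $E(I_B(Y_{n+1}) \mid \mathcal{G}_n^*) = E(I_B(Y_{n+1}) \mid \mathcal{G}_n)$ a.s., one obtains

$$E(I_B(Y_{n+2}) \mid \mathcal{G}_n^*) = E\{E(I_B(Y_{n+2}) \mid \mathcal{G}_{n+1}) \mid \mathcal{G}_n^*\}$$
$$= \frac{\alpha P(Y_1 \in B) + \sum_{i=1}^n Z_i I_B(Y_i) + Z_{n+1} E(I_B(Y_{n+1}) \mid \mathcal{G}_n^*)}{\alpha + \sum_{i=1}^{n+1} Z_i}$$
$$= \frac{(\alpha + \sum_{i=1}^n Z_i)\,E(I_B(Y_{n+1}) \mid \mathcal{G}_n) + Z_{n+1} E(I_B(Y_{n+1}) \mid \mathcal{G}_n)}{\alpha + \sum_{i=1}^{n+1} Z_i}$$
$$= E(I_B(Y_{n+1}) \mid \mathcal{G}_n) = E(I_B(Y_{n+1}) \mid \mathcal{G}_n^*) \quad\text{a.s.}$$

Finally, since $\mathcal{G}_n \subset \mathcal{G}_n^*$, the previous equality implies that

$$E(I_B(Y_{n+2}) I_C(Z_{n+2}) \mid \mathcal{G}_n) = P(Z_1 \in C)\,E\{E(I_B(Y_{n+2}) \mid \mathcal{G}_n^*) \mid \mathcal{G}_n\}$$
$$= P(Z_1 \in C)\,E\{E(I_B(Y_{n+1}) \mid \mathcal{G}_n^*) \mid \mathcal{G}_n\}$$
$$= E(I_B(Y_{n+1}) I_C(Z_{n+1}) \mid \mathcal{G}_n) \quad\text{a.s.}$$

Therefore, $X$ is c.i.d. □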

Usually, one is interested in predicting $Y_n$ more than $Z_n$. Thus, in the sequel, we focus on $P(Y_{n+1} \in B \mid \mathcal{G}_n)$. For each $B \in \mathcal{B}_{\mathcal{Y}}$, we write

$$C_n(B) = C_n(B \times (0, \infty)), \qquad a_n(B) = a_n(B \times (0, \infty)) = P(Y_{n+1} \in B \mid \mathcal{G}_n),$$

and so on.

In Example 3.5 of [6], assuming $E Z_1^2 < \infty$, it is shown that

$$C_n(B) \to \mathcal{N}(0, \sigma_B^2) \text{ stably}, \quad\text{where } \sigma_B^2 = \frac{\operatorname{var}(Z_1)}{(E Z_1)^2}\,\mu(B)(1 - \mu(B)).$$

Here, we prove that $C_n$ converges stably when regarded as a map $C_n : S^\infty \to l^\infty(\mathcal{D})$, where $l^\infty(\mathcal{D})$ is the space of real bounded functions on $\mathcal{D}$ equipped with uniform distance; see Section 1.5 of [21]. In particular, stable convergence of $C_n$ as a random element of $l^\infty(\mathcal{D})$ implies stable convergence of $\|C_n\| = \sup_{B \in \mathcal{D}} |C_n(B)|$.

Intuitively, the stable limit of $C_n$ (when it exists) is connected to the Brownian bridge. Let $B_1, B_2, \ldots$ be pairwise disjoint elements of $\mathcal{B}_{\mathcal{Y}}$ and

$$\mathcal{D} = \{B_k \times (0, \infty) : k \geq 1\}, \qquad T_0 = 0, \qquad T_k = \sum_{i=1}^k \mu(B_i).$$

Also, let $G$ be a standard Brownian bridge process on some probability space $(\Omega_0, \mathcal{A}_0, P_0)$. For fixed $\omega \in S^\infty$,

$$L(\omega, B_k) = \frac{\sqrt{\operatorname{var}(Z_1)}}{E Z_1}\,\{G(T_k(\omega)) - G(T_{k-1}(\omega))\}$$

is a real random variable on $(\Omega_0, \mathcal{A}_0, P_0)$. Since the $B_k$ are pairwise disjoint and $G$ has continuous paths, $L(\omega, B_k) \to 0$ as $k \to \infty$. It thus makes sense to define $M(\omega, \cdot)$ as the probability distribution of $L(\omega) = (L(\omega, B_1), L(\omega, B_2), \ldots)$, that is,

$$M(\omega, A) = P_0(L(\omega) \in A) \quad\text{for each Borel set } A \subset l^\infty(\mathcal{D}).$$

Similarly, let $N(\omega, \cdot)$ be the probability distribution of $\sup_{k \geq 1} |L(\omega, B_k)|$, that is,

$$N(\omega, A) = P_0\Big(\sup_k |L(\omega, B_k)| \in A\Big) \quad\text{for each Borel set } A \subset \mathbb{R}.$$

Theorem 7. Suppose $B_1, B_2, \ldots \in \mathcal{B}_{\mathcal{Y}}$ are pairwise disjoint and $\mathcal{D}$, $M$, $N$ are defined as above. Let $X$ be assessed according to (4)-(5) with $a \leq Z_1 \leq b$ a.s. for some constants $0 < a < b$. Then,

$$\sup_n E\|W_n\|^2 \leq c\,P\Big(Y_1 \in \bigcup_k B_k\Big) \qquad (6)$$

for some constant $c$ independent of the $B_k$, and $C_n \to M$ stably (in the metric space $l^\infty(\mathcal{D})$). In particular, $\|C_n\| \to N$ stably.

Let $Q_1$ denote the probability law of a sequence $X$ satisfying (4)-(5) and $a \leq Z_1 \leq b$ a.s. In view of Theorem 7, $Q_1$ can play the role of $Q$ in Theorem 1. That is, for an arbitrary c.i.d. sequence $X$ with distribution $P$, one has $\|C_n\| \to N$ stably, provided $P \ll Q_1$ and $\|W_n\|$ is uniformly integrable under $P$. The condition of pairwise disjoint $B_k$ is actually rather strong. However, it holds in at least two relevant situations: when a single set $B$ is involved, and when $S = \{x_1, x_2, \ldots\}$ is countable and $B_k = \{x_k\}$ for all $k$.

Proof of Theorem 7. This proof involves some simple but long calculations. Accordingly, we provide only a sketch of the proof and refer to [7] for details.

Since $X$ is c.i.d., for fixed $B \in \mathcal{B}_{\mathcal{Y}}$, one has $a_n(B) = E(\mu(B) \mid \mathcal{G}_n)$ a.s. Hence, $(a_n(B) : n \geq 1)$ is a $\mathcal{G}_n$-martingale with $a_n(B) \xrightarrow{a.s.} \mu(B)$ and this implies that

$$E\{(a_{n+1}(B) - \mu(B))^2\} = E\Big\{\Big(\sum_{j > n} (a_j(B) - a_{j+1}(B))\Big)^2\Big\} = \sum_{j > n} E\{(a_j(B) - a_{j+1}(B))^2\}.$$

Replacing $a_j(B)$ by (4) and using the fact that $a \leq Z_i \leq b$ a.s. for all $i$, a long but straightforward calculation yields $\sum_{j > n} E\{(a_j(B) - a_{j+1}(B))^2\} \leq \frac{c_1}{n} P(Y_1 \in B)$, where $c_1$ is a constant independent of $B$. It follows that

$$E\|a_{n+1} - \mu\|^2 = E\Big\{\sup_k (a_{n+1}(B_k) - \mu(B_k))^2\Big\} \leq \sum_k E\{(a_{n+1}(B_k) - \mu(B_k))^2\}$$
$$= \sum_k \sum_{j > n} E\{(a_j(B_k) - a_{j+1}(B_k))^2\} \leq \frac{c_1}{n} \sum_k P(Y_1 \in B_k) = \frac{c_1}{n} P\Big(Y_1 \in \bigcup_k B_k\Big),$$

as the $B_k$ are pairwise disjoint.

Precisely as above, after some algebra, one obtains

$$E\|\mu_n - a_{n+1}\|^2 \leq \frac{c_2}{n} P\Big(Y_1 \in \bigcup_k B_k\Big)$$

for some constant $c_2$ independent of $B_1, B_2, \ldots$. Therefore,

$$E\|W_n\|^2 = n E\|\mu_n - \mu\|^2 \leq 2n E\|\mu_n - a_{n+1}\|^2 + 2n E\|a_{n+1} - \mu\|^2 \leq c\,P\Big(Y_1 \in \bigcup_k B_k\Big),$$

where $c = 2(c_1 + c_2)$. This proves inequality (6).

It remains to prove that $C_n \to M$ stably (in the metric space $l^\infty(\mathcal{D})$). For each $m \geq 1$, let $\Sigma_m$ be the $m \times m$ matrix with elements

$$\sigma_{k,j} = \frac{\operatorname{var}(Z_1)}{(E Z_1)^2}\,\big(\mu(B_k \cap B_j) - \mu(B_k)\mu(B_j)\big), \qquad k, j = 1, \ldots, m.$$

By Theorems 1.5.4 and 1.5.6 of [21], for $C_n \to M$ stably, it is enough that:

(i) (finite-dimensional convergence):

$$(C_n(B_1), \ldots, C_n(B_m)) \to \mathcal{N}_m(0, \Sigma_m) \text{ stably for each } m \geq 1,$$

where $\mathcal{N}_m(0, \Sigma_m)$ is the $m$-dimensional Gaussian law with mean 0 and covariance matrix $\Sigma_m$;

(ii) (asymptotic tightness): for each $\epsilon, \delta > 0$, there exists some $m \geq 1$ such that

$$\limsup_n P\Big(\sup_{r, s > m} |C_n(B_r) - C_n(B_s)| > \epsilon\Big) < \delta.$$
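The structure of the limiting covariance in (i) can be explored numerically. The sketch below assumes the entries have the form scale × ($\mu(B_k \cap B_j) - \mu(B_k)\mu(B_j)$) with a common positive factor passed in as `scale`, and uses pairwise disjointness so that $\mu(B_k \cap B_j) = 0$ for $k \neq j$. The probabilities and function names are hypothetical.

```python
def covariance_matrix(probs, scale):
    """Entries scale * (mu(B_k & B_j) - mu(B_k) * mu(B_j)) for disjoint B_1..B_m.

    probs[k] stands for mu(B_k); disjointness gives mu(B_k & B_j) = 0 for k != j.
    """
    m = len(probs)
    return [
        [scale * ((probs[k] if k == j else 0.0) - probs[k] * probs[j]) for j in range(m)]
        for k in range(m)
    ]

def quadratic_form(sigma, b):
    """Variance of the limit of sum_k b_k C_n(B_k): sum_{k,j} b_k b_j sigma_{k,j}."""
    m = len(b)
    return sum(b[k] * b[j] * sigma[k][j] for k in range(m) for j in range(m))

probs = [0.5, 0.3, 0.2]           # hypothetical values of mu(B_1), mu(B_2), mu(B_3)
sigma = covariance_matrix(probs, scale=1.0)
var_all_ones = quadratic_form(sigma, [1.0, 1.0, 1.0])
var_first = quadratic_form(sigma, [1.0, 0.0, 0.0])
print(var_all_ones, var_first)
```

When the sets form a partition, the coefficients $b_k = 1$ give a vanishing variance, consistent with $\sum_k C_n(B_k)$ being identically zero in that case; a single set recovers the one-dimensional variance $\mu(B)(1 - \mu(B))$ up to the factor.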

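Fix $m \geq 1$, $b_1, \ldots, b_m \in \mathbb{R}$ and define $R_n = \sum_{k=1}^m b_k I_{B_k}(Y_n)$. Since $(R_n : n \geq 1)$ is c.i.d., arguing exactly as in Example 3.5 of [6], one obtains

$$\sum_{k=1}^m b_k C_n(B_k) = \frac{1}{\sqrt{n}} \sum_{i=1}^n \{R_i - E(R_{n+1} \mid \mathcal{G}_n)\} \to \mathcal{N}\Big(0, \sum_{k,j} b_k b_j \sigma_{k,j}\Big) \quad\text{stably}.$$

Since $b_1, \ldots, b_m$ are arbitrary, (i) holds. To check (ii), given $\epsilon, \delta > 0$, take $m$ such that

$$\frac{4c}{\epsilon^2}\,P\Big(Y_1 \in \bigcup_{r > m} B_r\Big) < \delta,$$

where $c$ is the constant involved in (6). By what has already been proven,

$$P\Big(\sup_{r, s > m} |C_n(B_r) - C_n(B_s)| > \epsilon\Big) \leq P\Big(2 \sup_{r > m} |C_n(B_r)| > \epsilon\Big) \leq P\Big(2 E\Big(\sup_{r > m} |W_n(B_r)| \,\Big|\, \mathcal{G}_n\Big) > \epsilon\Big)$$
$$\leq \frac{4}{\epsilon^2}\,E\Big\{\sup_{r > m} W_n(B_r)^2\Big\} \leq \frac{4c}{\epsilon^2}\,P\Big(Y_1 \in \bigcup_{r > m} B_r\Big) < \delta.$$

Thus, (ii) holds and this completes the proof. □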